Energy contour extraction for in-car speech recognition
Abstract
The time derivatives of speech energy, such as the delta and delta-delta log energy, are known to be critical features for automatic speech recognition (ASR). However, at lower signal-to-noise ratios (SNR) their discriminative ability can be limited, or they can even become harmful, because the energy contour is corrupted. By taking advantage of the spectral characteristics of in-car noise, the speech energy contour is extracted from the high-pass filtered signal so as to reduce the distortion in the delta energy. Such filtering can be implemented using a pre-emphasis-like filter or a summation of higher-frequency band energies. A Chinese name recognition task is conducted to evaluate the proposed method, using both real and artificially generated in-car speech as test data. The experimental results show that the method improves the recognition accuracy of in-car speech at lower SNR as well as that of clean speech.
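The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter coefficient `alpha`, the frame length and hop, the regression width of the delta, and the synthetic signal (a strong 50 Hz sine standing in for low-frequency in-car noise, plus a weaker 2 kHz burst standing in for speech) are all assumptions chosen for illustration. Only the pre-emphasis-like variant is shown, not the summation of higher-frequency band energies.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def log_energy_contour(x, frame_len=400, hop=160, floor=1e-10):
    """Log of the summed squared samples in each frame (frame sizes are assumed)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.array(
        [np.sum(x[i * hop : i * hop + frame_len] ** 2) for i in range(n_frames)]
    )
    return np.log(np.maximum(energies, floor))

def delta(c, width=2):
    """Regression-based time derivative of a contour, with edge padding."""
    c = np.asarray(c, dtype=float)
    padded = np.pad(c, width, mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(c)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + len(c)]
        behind = padded[width - k : width - k + len(c)]
        out += k * (ahead - behind)
    return out / denom

def demo_ranges(fs=16000):
    """Return the dynamic range of the raw and high-passed log energy contours."""
    t = np.arange(fs) / fs                      # one second of signal
    noise = np.sin(2 * np.pi * 50 * t)          # hypothetical low-frequency noise
    burst = 0.3 * np.sin(2 * np.pi * 2000 * t)  # hypothetical speech-like burst
    burst[(t < 0.4) | (t > 0.6)] = 0.0
    noisy = noise + burst
    raw = log_energy_contour(noisy)
    high_passed = log_energy_contour(pre_emphasis(noisy))
    return raw.max() - raw.min(), high_passed.max() - high_passed.min()

raw_range, hp_range = demo_ranges()
print("raw contour range:", raw_range)
print("high-passed contour range:", hp_range)
```

On this synthetic signal the low-frequency noise dominates the raw frame energy, so the burst barely moves the raw contour. After the high-pass step the burst stands out more clearly, and applying `delta` to the high-passed contour gives a delta log energy that is less affected by the noise. Real in-car noise is broadband, so the effect on recorded speech would be less clean than in this toy case.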
Similar resources
Improved Melody Recognition Performance of a Cochlear Implant Speech Processing Strategy Using Instantaneous Frequency Encoding Based on Teager Energy Operator
We present a speech processing strategy incorporating instantaneous frequency (IF) encoding for the enhancement of melody recognition performance of cochlear implants. For the IF extraction from incoming sound, we propose the use of a Teager energy operator (TEO), which is advantageous for its lower computational load. From time-frequency analysis, we verified that the TEO-based method provides...
Classification of Taiwanese tones based on pitch and energy movements
This paper addresses the difficulties associated with automatically distinguishing the seven Taiwanese tones. The tone recogniser is an essential component of any automatic speech recognition system customised for tone languages such as Taiwanese. We show that it is difficult to distinguish between the Taiwanese tones simply employing the fundamental frequency contours and that the task is simp...
Towards High Performance Phonotactic Feature for Spoken Language Recognition
With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...
The Lombard Effect in Spontaneous Dialog Speech
The Lombard effect – environmental noise affects speech production – has already been studied extensively for read lab speech. In this study spontaneous dialog speech produced by 24 German speakers has been recorded under noisy conditions and analysed for the Lombard effect. A sophisticated experimental setup using behind-the-ear hearing aid equipment allows us to insert real car noise into the...
A Survey – Audio and Video Synchronization
Audio and video synchronization is essential. Loss of synchronization between image and sound continues to disturb viewers and irritate broadcasters. The requirement is to ensure synchronization without altering the content while keeping the cost low. The objective of synchronization is to align the audio and video signals, which are processed separately. T...
Publication date: 2003